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Background: We conducted a genome-wide association study (GWAS) to identify specific 
genetic variants that underlie susceptibility to diseases caused by Staphylococcus aureus 
in humans. 

Methods: Cases (n = 309) and controls (n = 2925) were genotyped at 508,921 single 
nucleotide polymorphisms (SNPs). Cases had at least one laboratory and clinician 
confirmed disease caused by S. aureus whereas controls did not. R-package (for 
SNP association), EIGENSOFT (to estimate and adjust for population stratification) and 
gene- (VEGAS) and pathway-based (DAVID, PANTHER, and Ingenuity Pathway Analysis) 
analyses were performed. 

Results: No SNP reached genome-wide significance. Four SNPs exceeded the p < 10~ 5 
threshold including two (rs2455012 and rs7152530) reaching a p-value < 10~ 7 . The nearby 
genes were PDE4B (rs2455012), TXNRD2 (rs3804047), VRK1 and BCL11B (rs7152530), 
and PNPLA5 (rs470093). The top two findings from the gene-based analysis were NMRK2 
(Pgene = 1.20E-05), which codes an integrin binding molecule (focal adhesion), and DAPK3 
(Pgene = 5.10E-05), a serine/threonine kinase (apoptosis and cytokinesis). The pathway 
analyses identified epithelial cell responses to mechanical and non-mechanical stress. 

Conclusion: We identified potential susceptibility genes for S. aureus diseases in this 
preliminary study but confirmation by other studies is needed. The observed associations 
could be relevant given the complexity of S. aureus as a pathogen and its ability to exploit 
multiple biological pathways to cause infections in humans. 
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INTRODUCTION 

Staphylococcus aureus is a complex human pathogen due to its 
ability to survive both as a carriage organism, and behave as 
an opportunistic pathogen in a susceptible host. It is a leading 
cause of invasive bacterial infection that contributes to substan- 
tial morbidity and mortality worldwide. This bacterium can cause 
a variety of diseases ranging from mild to severe skin and soft 
tissue infections, keratitis, and osteomyelitis to life-threatening 
bacteremia, pneumonia, endocarditis, and sepsis (Lowy, 1998; 
Rehm, 2008). Even though S. aureus colonizes human anterior 
nares and other sites on skin in 30-50% of the general popu- 
lation (Graham et al., 2006; Kuehnert et al., 2006; Gordon and 
Lowy, 2008), not everyone who is colonized gets infected. One 
reason could be differences in genetic susceptibility to coloniza- 
tion and infections. The established role of host susceptibility in 
other infectious diseases lends support for a role of host genet- 
ics in S. aureus infection (de Bakker and Telenti, 2010). Genetic 
susceptibility to S. aureus infections is expected to be complex 
because this pathogen uses a wide variety of virulence factors that 
interact with several host pathways to cause disease in humans. 



Numerous alleles segregating at a large number of loci 
contribute to complex disease susceptibility (Yang et al., 
2011) with contributions from both common and rare 
alleles. Because genetic risk factors for S. aureus infections 
have not been previously studied on a genome-wide scale, 
we utilized the Personalized Medicine Research Project 
(PMRP) of Marshfield Clinic — a large, population-based 
biobank of DNA samples — to perform a preliminary genome- 
wide association study (GWAS) of laboratory-confirmed 
S. aureus infections to discover the underlying host-pathogen 
interactions. 

MATERIALS AND METHODS 
SUBJECTS 

This study utilized Marshfield Clinic's PMRP biobank, a 
cohort of ~20,000 individuals from 14 Zip Codes surround- 
ing Marshfield, Wisconsin, USA. DNA samples from 3234 
PMRP participants were genotyped at > 500,000 SNPs as part 
of the NHGRI/NIGMS-funded eMERGE network (McCarty 
et al., 2011). These genotypic data are linked to longitudinal 
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electronic medical records (EMR) at the Marshfield Clinic 
and have served as a powerful resource in previous stud- 
ies (McCarty et al., 2011; Cross et al., 2012; Hebbring 
et al., 2013). The population of PMRP is stable and highly 
homogeneous with a predominant Northern European genetic 
background and so carries lower risk of confounding by pop- 
ulation stratification. All individuals in the study provided 
informed consent and the study was approved by the Marshfield 
Clinic IRB. 

STUDY DESIGN 

This study examined host susceptibility genes for S. aureus infec- 
tion regardless of S. aureus colonization status. We performed a 
case/control GWAS on a subgroup of the PMRP population: 3234 
subjects (309 cases/2925 controls). All study subjects were over 
49 years of age. Cases were defined as individuals who had at 
least one laboratory confirmed test in their medical record of a 
disease caused by S. aureus. Controls were subjects who did not 
have any evidence of infection due to S. aureus in their EMR. 
We reasoned that because cases had to have evidence of labora- 
tory confirmed S. aureus infection in the EMR, patients with no 
evidence of infection in the EMR would form an ideal control 
group. The results using our control group are likely very similar 
to those that would be obtained using population-based con- 
trols, as is often employed in GWAS studies (Burton et al., 2007). 
Only subjects with self-identified Northern European ancestry 
were included; therefore, both cases and controls had the same 
genetic ancestry. Additionally, a principal components analysis 
of their genome-wide genotypes in all study subjects (without 
knowledge of case/control status) was performed, revealing no 
evidence of population substructure (Supplemental Figure 1). 
With this filtered set of case/control subjects, we performed 
three levels of statistical analysis: SNP-based, Gene-based, and 
Pathway-based, to identify novel polymorphisms, genes, and 
pathways involved in susceptibility to S. aureus diseases. This 
hierarchical investigation of genetic effects allows for the incor- 
poration of a gradation of biological information into the statis- 
tical tests — the SNP-based scan is the most comprehensive and 
agnostic to prior biological knowledge, the gene-based analysis 
uses positional information and collapses effects from the same 
protein-coding region, and the pathway-based analysis incor- 
porates information obtained from various molecular biology 
studies. 

PHEN0TYPES 

We extracted demographic and medical information on all 
case/control subjects from the Marshfield Clinic EMR. All cases 
had an active infection that had yielded S. aureus as the major or 
the only bacterium on a culture plate from a clinical sample such 
as blood, sputum, etc. Hospital surveillance subjects who were 
positive for S. aureus colonization by PCR were excluded from 
the study. Age, sex, body mass index (BMI), and Type 2 Diabetes 
(T2D) status as determined from the EMR were tested for associ- 
ation with case/control status. The binary variables were analyzed 
using a Fisher's exact test, and the continuous variables were ana- 
lyzed through a two-tailed T-test to test mean differences between 
cases and controls. 



GEN0TYPIC DATA 

The Illumina 660W-QUAD Beadchip array (Illumina, San Diego, 
California, USA) was used to generate genotype data on over 
500 K SNPs from cases and controls. To ease automation of anal- 
ysis, only data obtained from autosomes were analyzed. After 
filtering out SNPs with low minor allele frequency ( <0.01), miss- 
ing genotype data (> 0.05 of the study population with missing 
genotypes) or a significant departure from Hardy- Weinberg equi- 
librium (HWE) (p < 1E-5), there were 508,921 SNPs that passed 
the quality control screening and were used in subsequent analy- 
sis. For a sample to be included in the analysis, we ensured that it 
had at least 99% of the non-missing SNPs and for each SNP to be 
included in the analysis, we required that the SNP should have at 
least 95% of the non-missing subjects. 

POPULATION STRATIFICATION ESTIMATION USING EIGENS0FT 

We used the EIGENSOFT program (Price et al., 2006) to examine 
population stratification in our dataset. The program combines 
population genetics methods and the use of PCA to explicitly 
capture ancestry differences between cases and controls along 
continuous axes of variation. 

SNP-BASED ANALYSIS USING PUNK AND R-PACKAGE 

PLINK (Purcell et al., 2007), a whole genome association analysis 
tool set, was used to perform the filtering and QC procedures (as 
described under "Genotypic data" above) on the raw dataset to 
generate the data set employed in this study. R is a free software 
programming language and software environment for statistical 
computing and graphics (R. Core Team, 2013). The glm() func- 
tion within R was used to establish the logistic model assuming 
an additive mode of inheritance with adjustment for the follow- 
ing risk factors: age, gender, BMI, diabetes and the top three 
principal components estimated from EIGENSOFT for popula- 
tion stratification. Thep-values calculated from the logistic model 
were used to test the associations between SNPs and case/control 
phenotype. 

GENE-BASED ANALYSIS USING VEGAS 

VEGAS is a software program that tests association between a 
gene and phenotype trait. VEGAS uses SNP-level data to incorpo- 
rate information from a full set of markers annotated to each gene 
and accounts for linkage disequilibrium (LD) between markers 
(Neale and Sham, 2004; Liu et al, 2010). All SNPs annotated to a 
gene were used in the calculation, and the method adjusted by the 
linkage disequilibrium structure by using HapMap data. Monte 
Carlo simulations from multivariate Gaussian random variables 
and Cholesky decomposition matrices were employed by VEGAS 
to produce disease association p-values per gene, correcting for 
the correlation structure between nearby SNPs. This type of anal- 
ysis has several advantages including the collapsing of effects for 
all genotyped SNPs within each gene, and reducing the multiple 
testing burdens. The LD structure of each gene region is fac- 
tored into the analysis through applying decomposition matrices 
in the analysis, effectively factoring-out the correlational struc- 
ture between tightly linked SNPs. Additionally, gene-based results 
enable the subsequent use of many pathway analysis packages 
designed to use a single measurement from each gene. 
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PATHWAY-BASED ANALYSIS BY DAVID, PANTHER, AND INGENUITY 
PATHWAY ANALYSIS (IPA) PROGRAMS 

All genes with a p-value = 0.01 from the VEGAS analysis were 
used as input to perform separate pathway analysis by DAVID, 
PANTHER, and IPA. The DAVID analyses consisted of three 
steps: measurement of the functional relationship of gene pairs, 
a DAVID agglomeration procedure to partition genes into func- 
tional gene groups, and visualization of the results (Huang et al., 
2007, 2009a,b). PANTHER is a publically-available database hav- 
ing gene ontology, functional annotation, and evolutionary con- 
servation information. PANTHER Pathways are available within 
the database and these pathways were used in our analysis. We 
calculated p-values for enrichment of S. aureus-associated genes 
within specific PANTHER pathways using a standard hypergeo- 
metric statistical approach. The statistical over- representation test 
was implemented in the PANTHER program. A binomial test was 
used to compare our gene list to a reference list (all human genes) 
to determine over- or under- representation of genes from our 
list in PANTHER gene function categories using an experiment- 
wise approach (experiment-wise a = 0.05) (Mi and Thomas, 
2009). Gene-based analyses using IPA (Ingenuity® Systems, 
www.ingenuity.com, Redwood City, California, USA, www. 
ingenuity.com) were used to generate protein-protein interaction 
networks. 

RESULTS 

POPULATION STRATIFICATION 

Using EIGENSOFT to perform PCA of the 508,921 genotypes 
from all study subjects (without knowledge of case/control sta- 
tus), we found no evidence of strong population stratification 
(Supplemental Figure 1). Therefore none of the 3234 study sub- 
jects were excluded because of population stratification. 

STUDY SUBJECT CHARACTERISTICS 

It has been previously noted that male sex, age, high BMI and 
T2D are risk factors for invasive S. aureus infection (Graffunder 
and Venezia, 2002). As expected, the percent of males was signifi- 
cantly greater in the cases than the control group (51 vs. 39%; p = 
5.50E-05) (Table 1). There was no significant difference in mean 
age of case and controls. The prevalence of T2D was significantly 
higher among cases than controls (22.3 vs. 12.7%; p = 1.03E-05) 
and so was the mean BMI (cases: 32.1 vs. control: 29.5 kg/m 2 ; 



Table 1 | Demographic and other phenotypic characteristics. 
Characteristic Cases Controls p-value* 



Cases 
(N = 309) 



Controls 
(N = 2925) 



Age (years) (mean ± SD) 
Males (W, %) 
Type 2 diabetes (N, %) 
Body mass index (kg/m 2 ) 
(mean ± SD) 



74.5 ± 10.9 
159 (51.5) 
69 (22.3) 
32.1 ± 8.0 



73.5 (10.8) 
1153 (39.4) 
370 (12.7) 
29.5 ± 6.1 



0.13 
5.50E-05 
1.03E-05 
3.50E-08 



"Two-tailed t-test assuming unequal variances used to compare continuous 
variables (age and body mass index) and Fisher's exact test used to compare 
categorical variables (% males and % with type 2 diabetes). 



p = 3.50E-8). Our findings are consistent with those of previous 
reports. 

Q-Q PLOT 

We performed Q-Q analysis on the p-values obtained using logis- 
tic model assuming additive model of inheritance (Figure 1). The 
plot showed no evidence of population stratification, confound- 
ing effects, or systematic bias in the results from the statistical 
routines employed. 

HWE 

Eight hundred forty three SNPs were excluded from the analysis 
due to departure from HWE exceeding a = 1E-05. 

SNP-BASED ANALYSES 

SNP associations were tested using logistic regression analysis 
after adjusting for risk factors such as age, gender, BMI, diabetes 
and three principal components. No single SNP in the GWAS 
reached the level of genome-wide significance (p < 5 x 10~ 8 ) 
(Figure 2). However, four SNPs exceeded thep < 10~ 5 threshold, 
including two SNPs (rs2455012 and rs7152530) with a p < 10~ 7 
(Table 2). Out of these four SNPs, two were intronic (PDE4B and 
TXNRD2), one was intergenic with respect to VRK1 and BCL1 IB, 
and one was in the 3'UTR of PNPLA5. Of the four SNPs on 
chromosome 14 (Table 2), rsl892234 was in weak linkage dise- 
quilibrium (LD) with the other three SNPs (r 2 < 0.42) which were 
in strong LD with each other (r 2 = 0.80). The two SNPs in XRN1 
were in strong LD (r 2 = 0.92), two of the SNPs on chromosome 
22 (rs470093 and rs9614174) were in moderate LD (r 2 = 0.49), 
and the two SNPs on chromosome 19 exhibited low LD (r 2 = 
0.09). 

VEGAS-BASED GENE ANALYSES 

Table 3 shows the top 15 genes ranked by their p- values from the 
VEGAS analysis and four additional interesting genes that could 
potentially have a role in S. aureus-caused diseases based on their 
known involvement in immune and inflammatory processes. The 
topmost hit was NMRK2 (or ITGB1BP3; p = 1.20E-05) which 
encodes nicotinamide riboside kinase 2. Two of the 15 genes 
also featured in the list from the SNP-based analysis: DAPK3 
(p = 5.10E-05) andXRNl {p = 1.85E-04). 

PATHWAY ANALYSES 

The top-ranked 196 genes (p-value = 0.01) resulting from the 
VEGAS analysis were selected for the DAVID analyses but none 
of the gene groups were statistically significant. One of the top 
gene groups (gene group 1; Supplemental Table 1) included CST8 
(cystatin 8), SERPINA6 (serine peptidase inhibitor, clade A, mem- 
ber 6), SERPINA10 (serine peptidase inhibitor, clade A, member 
10), and SPINK1 (serine peptidase inhibitor, Kazak type 1). The 
enriched genes in group 2 (Supplemental Table 1) included four 
keratin genes: KRT24 (form intermediate filament), KRT82 (type 
II hair keratin), KRT12 (type I intermediate filament keratin 12), 
and KRT75 (form intermediate filament in in the cytoplasm of 
epithelial cells). As expected, the PANTHER analysis also sug- 
gested enrichment for intermediate filament cytoskeleton path- 
way (Supplemental Table 2). Using the same gene set input as in 
DAVID and PANTHER, the IPA was also explored with the intent 
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Expected -log, 0 (p) 

FIGURE 1 | The Q-Q plot of the p-values from all 508,921 SNPs. The x-axis shows the expected — log-jo(p-value). The y-axis shows the 
observed -logio(p-value). 



pi I, 4 j .. .. J ' V 





FIGURE 2 | The manhattan plot of the p-values from all 508,921 SNPs. The x-axis shows the chromosome numbers. The y-axis is the -log^ (p-value). The 
blue line is p-value of 10~ 5 whereas the red line shows the genome wide significance p-value of 5 x 10~ 8 . 



of hypothesis-generating protein-protein interaction networks 
associated our gene data set. The IPA yielded 12 protein networks 
(data not shown) of which network 7, involving cell death, and 
survival, appeared interesting (Supplemental Figure 2). 

DISCUSSION 

Multiple diseases (endocarditis, skin infections, etc.) caused by 
S. aureus are mediated by two main classes of virulence factors, 
adhesion and secretory proteins, which interact with host recep- 
tors and inflammatory/ anti-inflammatory molecules to produce 
the disease phenotype (Gordon and Lowy, 2008). Adhesion pro- 
teins help the pathogen to attach to the skin and survive on 



the epidermis and in the sub-epidermal layer through a reper- 
toire of molecules collectively known as MSCRAMMS (microbial 
surface components recognizing adhesive matrix molecules). The 
MSCRAMMS can bind to fibronectin, fibrinogen, and platelets 
among others. Subsequent to attachment, S. aureus can secrete 
tissue and organ-specific virulence proteins (e.g., coagulase, pro- 
teases, toxins, superantigens) with a wide range of virulence 
functions that enable the pathogen to infect its host. 

So far, most of the genetic susceptibility data related to S. 
aureus has been limited to S. aureus colonization. For example, 
the glucocorticoid receptor gene polymorphisms are associated 
with carriage risk (van den Akker et al., 2006), whereas DEFB1 has 
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Table 2 | Top 20 SNPs, their chromosomal locations, associated genes, major and minor alleles, minor allele frequency of cases and controls, 
p-value and odds ratio and 95% confidence interval. 

db SNP ID Chr Position Nearby genes Variant Major/minor Case control P-value Odds ratio 

(hg19) location alleles MAP MAF (95% CI) 
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been shown to promote persistent colonization by modulating 
beta-defensin expression in keratinocytes (Nurjadi et al., 2013). 
In contrast, the Danish middle-aged/elderly twin study showed 
that host genetics had a modest influence only on S. aureus car- 
rier state (Andersen et al., 2012). While our manuscript was 
under review, a study by Nelson et al did not find any SNP 
that has genome wide significance for association with S. aureus 
bacteremia (SAB) although an intronic SNP in CDON was specu- 
lated to be associated with complicated SAB (Nelson et al., 2014). 
In a report by Stappers et al., four SNPs from three toll-like 
receptor (TLR) genes, TLR1, TLR2, and TLR6, increase the sus- 
ceptibility to complicated skin and soft tissue infections caused 
by staphylococci, streptococci, and enterococci (Stappers et al., 
2014). 

In our multi-tiered GWAS-based investigation, we have 
identified a number of potentially interesting genes that 
need further investigation. Although the individual SNP 
results did not pass genome-wide significance, the top-tier 
SNPs, Gene-based, and Pathway-based results were enriched 
for genes that have plausible functions in bacterial infec- 
tions. This includes genes that have roles in intracellular 
signaling, inflammation, zinc transport, and integrin bind- 
ing. Thus, these genes have relatively high prior probabil- 
ities for involvement in S. aureus infection susceptibility — 
an aspect of the results that we believe supports consid- 
erable interest in subsequent studies to follow-up on these 
findings. 

Notably, two genes (DAPK3 and XRN1) were identified in 
both the SNP-based and gene-based analyses. DAPK3 is a protein 



kinase that modulates apoptosis-related signaling pathways (Wu 
et al., 2010) and, in interaction with RhoD, modulates actin 
filament assembly and focal adhesion reorganization (Nehru 
et al, 2013). S. aureus is known to induce apoptosis of host 
cells during host invasion, leading to a compromised host 
immune response (Haslinger-Loffler et al., 2005). XRN1 encodes 
a 5'— 3' exonuclease family member involved in cellular mRNA 
turnover (Nagarajan et al., 2013). The gene is shown to com- 
plete host mRNA degradation initiated by viral pathogens 
(Gaglia et al., 2012). These functions suggest possible roles 
for DAPK3 and XRN1 in susceptibility to diseases caused by 
S. aureus. 

Some of the other genes we identified have been implicated 
in infectious disease processes. PDE4B is involved in modulating 
bacteria-induced inflammation (Komatsu et al., 2013). PNPLA5 
appears to be critical for autophagosome functions, including 
microbial clearance (Dupont et al., 2014). Mammalian hosts are 
known to reduce the level of free zinc to thwart pathogen growth 
(Kehl-Fie and Skaar, 2010) and it is plausible that SLC39A8, a 
zinc transporter, could be involved. BCL1 IB encodes a transcrip- 
tional repressor involved in T-cell development (Wakabayashi 
et al., 2003). NMRK2, an integrin betal binding protein, could 
function in host responses to bacterial function based on the 
finding that host fibronectin forms a bridge between S. aureus 
fibronectin-binding proteins and host cell betal integrins during 
S. aureus cellular invasion (Fowler et al, 2000). Keratin inter- 
mediate filaments are shown to have a protective role during 
infection with Bartonella henselae in cat scratch disease (Zhu et al., 
2013). Interleukin 1 cytokine family members (ILIA, IL1B, IL1R1, 
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Table 3 | Top 15 and four other potentially interesting gene hits from 
VEGAS gene-based analysis. 
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10 


103585731 


103603677 


0.000637 


MGEA5 


6 


10 


103544200 


103578222 


0.000654 
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2 
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0.00322 
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22 


2 
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0.00363 


*IL1R1 


54 


2 
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(168) 

* Ranks of these genes are mentioned in parenthesis. 

and IL1RL2) are known mediators of immune and inflammatory 
responses (Garlanda et al, 2013). 

It is known that many Mendelian and oligogenetic immun- 
odeficiency disorders confer risk to staphylococcal infection 
including lymphocyte deficiencies such as severe combined 
immunodeficiency, chronic granulomatous disease, and hyper- 
IgE syndrome (Stephan et al., 1993; Grimbacher et al., 1999; Van 
de Vosse et al., 2009). These disorders are typified by highly dis- 
ruptive mutations occurring in genes central to lymphoid cell 
competency including STAT3, JAK3, DOCK8, and CD18, among 
others (Hogg et al, 1999; Kalman et al, 2004; Jiao et al, 2008; 
Zhang et al, 2009). However, most cases of severe staphylococcal 
infection are not attributable to these more rare conditions and 
have unknown genetic etiology. 

In summary, this preliminary GWAS applied a SNP-to- 
gene-to-disease-pathway approach to identify susceptibility genes 
against a broad umbrella of laboratory confirmed S. aureus infec- 
tions. While no one SNP and gene was found to be highly 
significant in this study, we suspect that for a versatile pathogen 
like S. aureus, that needs to overcome barriers presented by a 
variety of tissues and defense systems to infect various sites in 
the body, there are bound to be several genes involved in host 
susceptibility. Not everyone exposed to a virulent or a colo- 
nizing strain of S. aureus has similar severity of infection. It 
is reasonable to speculate that effects of variants segregating at 
multiple genes contribute to the severity of S. aureus infection. 



Similarly, there could be protective alleles that may lower the risk 
of clinically-attended infection as well. Additional studies will be 
needed to confirm these findings but eventually functional stud- 
ies will be needed to illuminate the detailed mechanisms of how 
these variants confer predisposition to infection. Once consen- 
sus disease loci and pathways are identified, they can serve as 
targets for future pharmaceutical development and further elu- 
cidation of how aberrant cellular processes/signaling give rise to 
Staphylococcus-mduced pathologies. 
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